Abstract. The first separation between quantum polynomial time and classical bounded-error polynomial time was
due to Bernstein and Vazirani in 1993. They first showed an $O(1)$ vs. $\Omega(n)$ quantum-classical oracle separation based
on the quantum Hadamard transform, and then showed how to amplify this into an $n^{O(1)}$ time quantum algorithm
and an $n^{\Omega(\log n)}$ classical query lower bound.

We generalize both aspects of this speedup. We show that a wide class of unitary circuits (which we call dispersing 
circuits) can be used in place of Hadamards to obtain an $O(1)$ vs. $\Omega(n)$ separation. The class of dispersing circuits
includes all quantum Fourier transforms (including over nonabelian groups) as well as nearly all sufficiently long 
random circuits. Second, we give a general method for amplifying quantum-classical separations that allows us to 
achieve an $n^{O(1)}$ vs. $n^{\Omega(\log n)}$ separation from any dispersing circuit.

1 Background 

Understanding the power of quantum computation relative to classical computation is a fundamental ques- 
tion. When we look at which problems can be solved in quantum but not classical polynomial time, we get 
a wide range: quantum simulation, factoring, approximating the Jones polynomial, Pell's equation, estimating
Gauss sums, period-finding, group order-finding and even detecting some mildly non-abelian symmetries
[Sho97,Hal07,Wat01,FIM+03,vDHI03]. However, when we look at what algorithmic tools exist on a
quantum computer, the situation is not nearly as diverse. Apart from the BQP-complete problems [AJL06], 
the main tool for solving most of these problems is a quantum Fourier transform (QFT) over some group. 
Moreover, the successes have been for cases where the group is abelian or close to abelian in some way. For 
sufficiently nonabelian groups, there has been no indication that the transforms are useful even though they 
can be computed exponentially faster than classically. For example, while an efficient QFT for the symmet- 
ric group has been intensively studied for over a decade because of its connection to graph isomorphism, it 
is still unknown whether it can be used to achieve any kind of speedup over classical computation [Bea97]. 

The first separation between quantum computation and randomized computation was the Recursive 
Fourier Sampling problem (RFS) [BV97]. This algorithm had two components, namely using a Fourier 
transform, and using recursion. Shortly after this, Simon's algorithm and then Shor's algorithm for factoring 
were discovered, and the techniques from these algorithms have been the focus of most quantum algo- 
rithmic research since [Sim97,Sho97]. These developed into the hidden subgroup framework. The hidden 
subgroup problem is an oracle problem, but solving certain cases of it would result in solutions for factor- 
ing, graph isomorphism, and certain shortest lattice vector problems. Indeed, it was hoped that an algorithm 
for graph isomorphism could be found, but recent evidence suggests that this approach may not lead to 
one [HMR+06]. As a way to understand new techniques, this oracle problem has been very important, and 
it is also one of the very few where super-polynomial speedups have been found [IMS01,BCvD05]. 

In comparison to factoring, the RFS problem has received much less attention. The problem is defined 
as a property of a tree with labeled nodes and it was proven to be solvable with a quantum algorithm 
super-polynomially faster than the best randomized algorithm. This tree was defined in terms of the Fourier 
coefficients over $\mathbb{Z}_2^n$. The definition was rather technical, and it seemed that the simplicity of the Fourier
coefficients for this group was necessary for the construction to work. Even the variants introduced by 
Aaronson [Aar03] were still based on the same QFT over $\mathbb{Z}_2^n$, which seemed to indicate that this particular
abelian QFT was a key part of the quantum advantage for RFS. 

The main result of this paper is to show that the RFS structure can be generalized far more broadly. In 
particular, we show that an RFS-style super-polynomial speedup is achievable using almost any quantum 
circuit, and more specifically, it is also true for any Fourier transform (even nonabelian), not just over $\mathbb{Z}_2^n$.
This illustrates a more general power that quantum computation has over classical computation when using 
recursion. The condition for a quantum circuit to be useful for an RFS-style speedup is that the circuit be 
dispersing, a concept we introduce to mean that it takes many different inputs to fairly even superpositions 
over most of the computational basis. 

Our algorithm should be contrasted with the original RFS algorithm. One of the main differences be- 
tween classical and quantum computing is so-called garbage that results from computing. It is important in 
certain cases, and crucial in recursion-based quantum algorithms because of quantum superpositions, that in- 
termediate computations are uncomputed and that errors do not compound. The original RFS paper [BV97] 
avoided the error issue by using an oracle problem where every quantum state created from it had the exact
property necessary with no errors. Their algorithm could have tolerated polynomially small errors, but in this 
paper we relax this significantly. We show that even if we can only create states with constant accuracy at 
each level of recursion, we can still carry through a recursive algorithm which introduces new constant-sized 
errors a polynomial number of times. 

The main technical part of our paper shows that most quantum circuits can be used to construct sepa- 
rations relative to appropriate oracles. To understand the difficulty here, consider two problems that occur 
when one tries to define an oracle whose output is related to the amplitudes that result from running a circuit. 
First, it is not clear how to implement such an oracle since different amplitudes have different magnitudes, 
and only phases can be changed easily. Second, we need an oracle where we can prove that a classical al- 
gorithm requires many queries to solve the problem. If the oracle outputs many bits, this can be difficult or 
impossible to achieve. For example, the matrix entries of nonabelian groups can quickly reveal which rep- 
resentation is being used. To overcome these two problems we show that there are binary-valued functions 
that can approximate the complex-valued output of quantum circuits in a certain way. 

One by-product of our algorithm is related to the Fourier transform of the symmetric group. Despite 
some initial promise for solving graph isomorphism, the symmetric group QFT has still not found any 
application in quantum algorithms. One instance of our result is the first example of a problem (albeit a rather 
artificial one) where the QFT over the symmetric group is used to achieve a super-polynomial speedup. 



2 Statement of results 

Our main contributions are to generalize the RFS algorithm of [BV97] in two stages. First, [BV97] described
the problem of Fourier sampling over $\mathbb{Z}_2^n$, which has an $O(1)$ vs. $\Omega(n)$ separation between quantum and
randomized complexities. We show that here the QFT over $\mathbb{Z}_2^n$ can be replaced with a QFT over any group,
or for that matter with almost any quantum circuit. Next, [BV97] turned Fourier sampling into recursive
Fourier sampling with a recursive technique. We will generalize this construction to cope with error and to
amplify a larger class of quantum speedups. As a result, we can turn any of the linear speedups we have
found into superpolynomial speedups.

Let us now explain each of these steps in more detail. We replace the $O(1)$ vs. $\Omega(n)$ separation based on
Fourier sampling with a similar separation based on a more general problem called oracle identification. In
the oracle identification problem, we are given access to an oracle $O_a : X \to \{0,1\}$ where $a \in A$, for some
sets $A$ and $X$ with $\log|A|, \log|X| = \Omega(n)$. Our goal is to determine the identity of $a$. Further, assume that
we have access to a testing oracle $T_a : A \to \{0,1\}$ defined by $T_a(a') = \delta_{a,a'}$, that will let us confirm that
we have the right answer (see footnote 3).

A quantum algorithm for identifying $a$ can be described as follows: first prepare a state $|\varphi_a\rangle$ using
$q$ queries to $O_a$, then perform a POVM $\{\Pi_{a'}\}_{a' \in A}$ (with $\sum_{a'} \Pi_{a'} \le I$ to allow for the possibility of a
"failure" outcome), using no further queries to $O_a$. The success probability is $\langle\varphi_a|\Pi_a|\varphi_a\rangle$. For our purposes,
it will suffice to place an $\Omega(1)$ lower bound on this probability: say that for each $a$, $\langle\varphi_a|\Pi_a|\varphi_a\rangle \ge \delta$ for some
constant $\delta > 0$. On the other hand, any classical algorithm trivially requires $\ge \log(|A|\delta) = \Omega(n)$ oracle calls
to identify $a$ with success probability $\ge \delta$. This is because each query returns only one bit of information. In
Theorem 9 we will describe how a large class of quantum circuits can achieve this $O(1)$ vs. $\Omega(n)$ separation,
and in Theorems 11 and 12 we will show specifically that QFTs and most random circuits fall within this
class.

Now we describe the amplification step. This is a variant of the [BV97] procedure in which making 
an oracle call in the original problem requires solving a sub-problem from the same family as the original 
problem. Iterating this $\ell$ times turns query complexity $q$ into $q^{\Theta(\ell)}$, so choosing $\ell = \Theta(\log n)$ will yield
the desired polynomial vs. super-polynomial separation. We will generalize this construction by defining an 
amplified version of oracle identification called recursive oracle identification. This is described in the next 
section, where we will see how it gives rise to superpolynomial speedups from a broad class of circuits. 

We conclude that quantum speedups — even superpolynomial speedups — are much more common than 
the conventional wisdom would suggest. Moreover, as useful as the QFT has been to quantum algorithms, 
it is far from the only source of quantum algorithmic advantage. 

3 Recursive amplification 

In this section we show that once we are given a constant versus linear separation (for quantum versus 
classical oracle identification), we are able to amplify this to a super-polynomial speedup. We require a 
much looser definition than in [BV97] because the constant case can have a large error. 

Definition 1. For sets $A, X$, let $f : A \times X \to \{0,1\}$ be a function. To set the scale of the problem, let
$|X| = 2^n$ and $|A| = 2^{\Omega(n)}$. Define the set of oracles $\{O_a : a \in A\}$ by $O_a(x) = f(a,x)$, and the states

$$|\varphi_a\rangle = \frac{1}{\sqrt{|X|}} \sum_{x \in X} (-1)^{f(a,x)} |x\rangle.$$

The single-level oracle identification problem is defined to be the task of
determining $a$ given access to $O_a$. Let $U$ be a family of quantum circuits, implicitly depending on $n$. We say
that $U$ solves the single-level oracle identification problem if

$$|\langle a|U|\varphi_a\rangle|^2 \ge \Omega(1)$$

for all sufficiently large $n$ and all $a \in A$. In this case, we define the POVM $\{\Pi_a\}_{a \in A}$ by $\Pi_a = U^\dagger |a\rangle\langle a| U$.

When this occurs, it means that $a$ can be identified from $O_a$ with $\Omega(1)$ success probability and using a
single query. In the next section, we will show how a broad class of unitaries $U$ (the so-called dispersing
unitaries) allow us to construct $f$ for which $U$ solves the single-level oracle identification problem. There
are natural generalizations to oracle identification problems requiring many queries, but we will not explore 
them here. 
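
As a concrete illustration of Definition 1, the following minimal sketch simulates the best-known instance: it assumes $f(a,x) = a \cdot x \bmod 2$ and $U = H^{\otimes n}$, the Fourier sampling setting of [BV97]. The helper names `phase_state` and `hadamard_transform` are hypothetical and chosen only for this example. For this choice of $f$ the overlap $|\langle a|U|\varphi_a\rangle|^2$ equals 1, so a single query identifies $a$.

```python
import math


def phase_state(a, n):
    """Amplitudes of 2^{-n/2} sum_x (-1)^{f(a,x)} |x> with f(a,x) = a.x mod 2 (assumed f)."""
    norm = 2 ** (-n / 2)
    return [norm * (-1) ** (bin(a & x).count("1") % 2) for x in range(2 ** n)]


def hadamard_transform(vec):
    """Apply H^{tensor n} to a length-2^n amplitude list via the Walsh-Hadamard butterfly."""
    out = list(vec)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    scale = 1 / math.sqrt(len(out))
    return [scale * t for t in out]


# Every secret a is recovered with probability |<a|U|phi_a>|^2 = 1 in this special case.
n = 4
for a in range(2 ** n):
    amps = hadamard_transform(phase_state(a, n))
    assert abs(amps[a] ** 2 - 1) < 1e-9
```

For a general dispersing $U$ the overlap is only a constant rather than 1, which is why the definition asks for $\Omega(1)$ and not for exact identification.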

Theorem 2. Suppose we are given a single-level oracle problem with function $f$ and unitary $U$ running in
time poly(n). Then we can construct a modified oracle problem from $f$ which can be solved by a quantum
computer in polynomial time (and queries), but requires $n^{\Omega(\log n)}$ queries for any classical algorithm that
succeeds with probability $\frac{1}{2} + n^{-o(\log n)}$.

3 This will later allow us to turn two-sided into one-sided error; unfortunately it also means that a non-deterministic Turing
machine can find $a$ with a single query to $T_a$. Thus, while the oracle defined in [BV97] is a candidate for placing BQP outside PH,
ours will not be able to place BQP outside of NP. This limitation appears not to be fundamental, but we will leave the problem
of circumventing it to future work.



We start by defining the modified version of the problem (Definition 3 below), and describing a quantum 
algorithm to solve it. Then in Theorem 4 we will show that the quantum algorithm solves the problem 
correctly in polynomial time, and in Theorem 6, we will show that randomized classical algorithms require 
superpolynomial time to have a nonnegligible probability of success. 

The recursive version of the problem simply requires that another instance of the problem be solved in 
order to access a value at a child. Figure 1 illustrates the structure of the problem. 



Fig. 1. A depth $k$ node at location $x = (x_1, \dots, x_k)$ is labeled by its secret $s_x$ and a bit $b_x$. The secret $s_x$ can be computed from
the bits $b_y$ of its children, and once it is known, the bit $b_x$ is computed from the oracle $O(x, s_x) = b_x$. If $x$ is a leaf then it has no
secret and we simply have $b_x = O(x)$. The goal is to compute the secret bit $b_\emptyset$ at the root.

Using the notation from Figure 1, the relation between a secret $s_x$ and the bits $b_y$ of its children is given
by $b_y = f(s_x, x')$, where $y = (x, x')$ is the child of $x$ obtained by appending $x' \in X$ and $f$ is the function from
the single-level oracle identification problem. Thus by
computing enough of the bits $b_{y_1}, b_{y_2}, \dots$ corresponding to children $y_1, y_2, \dots$, we can solve the single-level
oracle identification problem to find $s_x$. Of course computing the $b_y$ will require finding the secret strings
$s_y$, which requires finding the bits of their children and so on, until we reach the bottom layer where queries
return answer bits without the need to first produce secret strings.

Definition 3. A level-$\ell$ recursive oracle identification problem is specified by $X$, $A$ and $f$ from a single-level
oracle identification problem (Definition 1), any function $s : \emptyset \cup X \cup X \times X \cup \dots \cup X^{\ell-1} \to A$, and any
final answer $b_\emptyset \in \{0,1\}$. Given these ingredients, an oracle $O$ is defined which takes inputs in

$$\bigcup_{k=0}^{\ell-1} \left( X^k \times A \right) \cup X^\ell$$

and returns outputs in $\{0, 1, \mathrm{FAIL}\}$. On inputs $x_1, \dots, x_k \in X$, $a \in A$ with $1 \le k < \ell$, $O$ returns

$$O(x_1, \dots, x_k, a) = f(s(x_1, \dots, x_{k-1}), x_k) \quad \text{when } a = s(x_1, \dots, x_k) \qquad (1)$$

$$O(x_1, \dots, x_k, a) = \mathrm{FAIL} \quad \text{when } a \ne s(x_1, \dots, x_k). \qquad (2)$$

If $k = 0$, then $O(s(\emptyset)) = b_\emptyset$ and $O(a) = \mathrm{FAIL}$ if $a \ne s(\emptyset)$. When $k = \ell$,
$O(x_1, \dots, x_\ell) = f(s(x_1, \dots, x_{\ell-1}), x_\ell)$.

The recursive oracle identification problem is to determine $b_\emptyset$ given access to $O$.

Note that the function $s$ gives the values $s_x$ in Figure 1. These values are actually defined in the oracle
and can be chosen arbitrarily at each node. Note also that the oracle defined here effectively includes a
testing oracle, which can determine whether $a = s(x_1, \dots, x_k)$ for any $a \in A$, $x_1, \dots, x_k \in X$ with one
query. (When $x = (x_1, \dots, x_k)$, we use $s(x_1, \dots, x_k)$ and $s_x$ interchangeably.) A significant difference
between our construction and that of [BV97] is that the values of $s$ at different nodes can be set completely
independently in our construction, whereas [BV97] had a complicated consistency requirement.
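
The oracle of Definition 3 can be made concrete with a small classical toy model. The sketch below is illustrative only: it assumes $X = A = \{0, \dots, 2^n - 1\}$ and the inner-product function $f(a,x) = a \cdot x \bmod 2$, draws the secrets $s(\cdot)$ independently at random, and uses hypothetical names (`make_oracle`, `classical_solve`). The naive solver recovers each secret bit by bit from $n$ children, so its query count obeys $C(k) = n\,C(k+1) + 1$ and grows like $n^{\ell}$; it shows the recursive structure, not the classical lower bound.

```python
import random

FAIL = "FAIL"


def f(a, x):
    """Toy single-level function (an assumption): inner product of bit strings mod 2."""
    return bin(a & x).count("1") % 2


def make_oracle(n, depth, answer_bit, seed=0):
    """Build O for depth >= 1; secrets are drawn lazily and independently per node."""
    rng = random.Random(seed)
    secrets = {}

    def s(path):
        if path not in secrets:
            secrets[path] = rng.randrange(2 ** n)
        return secrets[path]

    def oracle(path, a=None):
        k = len(path)
        if k == depth:                       # leaf: no key is required
            return f(s(path[:-1]), path[-1])
        if a != s(path):                     # wrong key, equation (2)
            return FAIL
        if k == 0:                           # root: reveal the final answer bit
            return answer_bit
        return f(s(path[:-1]), path[-1])     # correct key, equation (1)

    return oracle


def classical_solve(oracle, n, depth):
    """Naive classical recursion; returns (answer bit, number of oracle queries)."""
    count = 0

    def node_bit(path):
        nonlocal count
        if len(path) == depth:
            count += 1
            return oracle(path)
        # Child x = 2^i reveals bit i of the secret at this node.
        a = sum(node_bit(path + (1 << i,)) << i for i in range(n))
        count += 1
        return oracle(path, a)

    return node_bit(()), count


answer, queries = classical_solve(make_oracle(3, 2, 1), 3, 2)
assert answer == 1
```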

The algorithm. Now we turn to a quantum algorithm for the recursive oracle identification problem.
If a quantum computer can identify $a$ with one-sided error $1 - \delta$ (see footnote 4) using time $T$ and $q$ queries in the
non-recursive problem, then we will show that the recursive version can be solved in time $O((q \log\frac{1}{\delta})^{\ell}\, T)$. For
concreteness, suppose that $|\varphi_a\rangle = \frac{1}{\sqrt{|X|}} \sum_{x \in X} (-1)^{f(a,x)} |x\rangle$, so that $q = 1$; the case when $q > 1$ is an
easy, but tedious, generalization. Suppose that our identifying quantum circuit is $U$, so $a$ can be identified
by applying the POVM $\{\Pi_{a'}\}_{a' \in A}$ with $\Pi_{a'} = U^\dagger |a'\rangle\langle a'| U$ to the state $|\varphi_a\rangle$.

The intuitive idea behind our algorithm is as follows: At each level, we find $s(x_1, \dots, x_k)$ by recursively
computing $s(x_1, \dots, x_{k+1})$ for each $x_{k+1}$ (in superposition) and using this information to create
many copies of $|\varphi_{s(x_1,\dots,x_k)}\rangle$, from which we can extract our answer. However, we need to account for the
errors carefully so that they do not blow up as we iterate the recursion. In what follows, we will adopt the
convention that Latin letters in kets (e.g. $|a\rangle, |x\rangle, \dots$) denote computational basis states, while Greek letters
(e.g. $|\zeta\rangle, |\varphi\rangle, \dots$) are general states that are possibly superpositions over many computational basis states.
Also, we let the subscript $(k)$ indicate a dependence on $(x_1, \dots, x_k)$. The recursive oracle identification
algorithm is as follows:

Algorithm: FIND

Input: $|x_1, \dots, x_k\rangle|0\rangle$ for $k < \ell$

Output: $a_{(k)} = s(x_1, \dots, x_k)$ up to error $\varepsilon = (\delta/8)^2$, where $\delta$ is the constant from the oracle. This means
that the output state is

$$|x_1, \dots, x_k\rangle \left[ \sqrt{1 - \varepsilon_{(k)}}\, |0\rangle |a_{(k)}\rangle |\zeta_{(k)}\rangle + \sqrt{\varepsilon_{(k)}}\, |1\rangle |\zeta'_{(k)}\rangle \right],$$

where $\varepsilon_{(k)} \le \varepsilon$ and $|\zeta_{(k)}\rangle$ and $|\zeta'_{(k)}\rangle$ are arbitrary.
(We can assume this form without loss of generality by absorbing phases into $|\zeta_{(k)}\rangle$ and $|\zeta'_{(k)}\rangle$.)

1. Create the superposition $\frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} |x_{k+1}\rangle$.

2. If $k + 1 < \ell$ then let $a_{(k+1)} = \mathrm{FIND}(x_1, \dots, x_{k+1})$ (with error $\le \varepsilon$), otherwise $a_{(k+1)} = \emptyset$.

3. Call the oracle $O(x_1, \dots, x_{k+1}, a_{(k+1)})$ to apply the phase $(-1)^{f(s(x_1,\dots,x_k),\, x_{k+1})}$ using the key $a_{(k+1)}$.

4. If $k + 1 < \ell$ then call $\mathrm{FIND}^\dagger$ to (approximately) uncompute $a_{(k+1)}$.

5. We are now left with $|\tilde\varphi_{(k)}\rangle$, which is close to $|\varphi_{s(x_1,\dots,x_k)}\rangle$.
Repeat steps 1 to 4 $m = \frac{4}{\delta} \ln\frac{8}{\delta}$ times to obtain $|\tilde\varphi_{(k)}\rangle^{\otimes m}$.

6. Coherently measure $\{\Pi_a\}$ on each copy and test the results (i.e. apply $U$, test the result, and apply $U^\dagger$).

7. If any tests pass, copy the correct $a_{(k)}$ to an output register, along with $|0\rangle$ to indicate success.
Otherwise put a $|1\rangle$ in the output to indicate failure.

8. Let everything else comprise the junk register $|\zeta_{(k)}\rangle$.

4 One-sided error is a reasonable demand given our access to a testing oracle. Most of these results go through with two-sided
error as well, but for notational simplicity, we will not explore them here.



Theorem 4. Calling FIND on |0) solves the recursive oracle problem in quantum polynomial time. 

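Proof. The proof is by backward induction on $k$; we assume that the algorithm returns with error $\le \varepsilon$ for
$k + 1$ and prove it for $k$. The initial step when $k = \ell$ is trivial since there is no need to compute $a_{(\ell+1)}$, and
thus no source of error. If $k < \ell$, then assume that correctness of the algorithm has already been proved for
$k + 1$. Therefore Step 2 leaves the state

$$\frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} |x_{k+1}\rangle \left[ \sqrt{1 - \varepsilon_{(k+1)}}\, |0\rangle |a_{(k+1)}\rangle |\zeta_{(k+1)}\rangle + \sqrt{\varepsilon_{(k+1)}}\, |1\rangle |\zeta'_{(k+1)}\rangle \right].$$

In Step 3, we assume for simplicity that the oracle was called conditional on the success of Step 2. This
yields

$$|\psi_{(k)}\rangle := \frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} |x_{k+1}\rangle \left[ (-1)^{f(a_{(k)}, x_{k+1})} \sqrt{1 - \varepsilon_{(k+1)}}\, |0\rangle |a_{(k+1)}\rangle |\zeta_{(k+1)}\rangle + \sqrt{\varepsilon_{(k+1)}}\, |1\rangle |\zeta'_{(k+1)}\rangle \right].$$

Now define the state $|\psi'_{(k)}\rangle$ by

$$|\psi'_{(k)}\rangle := \frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} (-1)^{f(a_{(k)}, x_{k+1})} |x_{k+1}\rangle \left[ \sqrt{1 - \varepsilon_{(k+1)}}\, |0\rangle |a_{(k+1)}\rangle |\zeta_{(k+1)}\rangle + \sqrt{\varepsilon_{(k+1)}}\, |1\rangle |\zeta'_{(k+1)}\rangle \right].$$

Note that

$$\langle \psi'_{(k)} | \psi_{(k)} \rangle = \frac{1}{|X|} \sum_{x_{k+1} \in X} \left( 1 - \varepsilon_{(k+1)} + (-1)^{f(a_{(k)}, x_{k+1})} \varepsilon_{(k+1)} \right).$$

This quantity is real and always $\ge 1 - 2\varepsilon_{(k+1)} \ge \sqrt{1 - 4\varepsilon}$ by the induction hypothesis. Let

$$|\varphi_{(k)}\rangle := \frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} (-1)^{f(a_{(k)}, x_{k+1})} |x_{k+1}\rangle |0\rangle.$$

Note that $\mathrm{FIND}^\dagger |x_1, \dots, x_k, \psi'_{(k)}\rangle = |x_1, \dots, x_k, \varphi_{(k)}\rangle$. Thus there exists $\varepsilon_{(k)}$ such that applying $\mathrm{FIND}^\dagger$
to $|x_1, \dots, x_k\rangle |\psi_{(k)}\rangle$ yields

$$|x_1, \dots, x_k\rangle \otimes \left[ \sqrt{1 - 4\varepsilon_{(k)}}\, |\varphi_{(k)}\rangle + \sqrt{4\varepsilon_{(k)}}\, |\varphi^{\perp}_{(k)}\rangle \right]$$

where $\langle \varphi_{(k)} | \varphi^{\perp}_{(k)} \rangle = 0$ and $\varepsilon_{(k)} \le \varepsilon$.

We now want to analyze the effects of measuring $\{\Pi_a\}$ when we are given the state

$$|\tilde\varphi_{(k)}\rangle := \sqrt{1 - 4\varepsilon_{(k)}}\, |\varphi_{(k)}\rangle + \sqrt{4\varepsilon_{(k)}}\, |\varphi^{\perp}_{(k)}\rangle$$

instead of $|\varphi_{(k)}\rangle$. If we define $\|M\|_1 = \mathrm{tr} \sqrt{M^\dagger M}$ for a matrix $M$, then
$\big\| |\tilde\varphi_{(k)}\rangle\langle\tilde\varphi_{(k)}| - |\varphi_{(k)}\rangle\langle\varphi_{(k)}| \big\|_1 = 4\sqrt{\varepsilon_{(k)}}$ [FvdG99]. Thus

$$\langle \tilde\varphi_{(k)} | \Pi_{a_{(k)}} | \tilde\varphi_{(k)} \rangle \ge \langle \varphi_{(k)} | \Pi_{a_{(k)}} | \varphi_{(k)} \rangle - 4\sqrt{\varepsilon_{(k)}} \ge \delta - 4\sqrt{\varepsilon} \ge \delta/2.$$

In the last step we have chosen $\varepsilon = (\delta/8)^2$.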
Finally, we need to guarantee that with probability $\ge 1 - \varepsilon$ at least one of the tests in Step 6 passes. After
applying $U$ and the test oracle to $|\tilde\varphi_{(k)}\rangle$, we have $\ge \sqrt{\delta/2}$ overlap with a successful test and $\le \sqrt{1 - \delta/2}$
overlap with an unsuccessful test. When we repeat this $m$ times, the amplitude in the subspace corresponding
to all tests failing is $\le (1 - \delta/2)^{m/2} \le e^{-m\delta/4}$. If we choose $m = (2/\delta) \ln(1/\varepsilon) = (4/\delta) \ln(8/\delta)$ then the
failure amplitude will be $\le \sqrt{\varepsilon}$, as desired.
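
The choice of $m$ can be checked numerically. The sketch below is a minimal illustration with hypothetical function names; it rounds $m$ up to an integer (an assumption, since the text treats $m$ as a real number) and verifies the chain $(1 - \delta/2)^{m/2} \le e^{-m\delta/4} \le \sqrt{\varepsilon}$ for a few values of $\delta$.

```python
import math


def repetitions(delta):
    """m = (2/delta) ln(1/eps) with eps = (delta/8)^2, rounded up to an integer."""
    eps = (delta / 8) ** 2
    return math.ceil((2 / delta) * math.log(1 / eps))


def failure_amplitude_bound(delta, m):
    """Upper bound (1 - delta/2)^(m/2) on the amplitude of 'all m tests fail'."""
    return (1 - delta / 2) ** (m / 2)


for delta in (0.5, 0.1, 0.01):
    eps = (delta / 8) ** 2
    m = repetitions(delta)
    bound = failure_amplitude_bound(delta, m)
    # 1 - x <= e^{-x} gives the first inequality; the choice of m gives the second.
    assert bound <= math.exp(-m * delta / 4) <= math.sqrt(eps) + 1e-15
```

Note that $m$ depends only on the constant $\delta$ and not on $n$, which is what keeps the overall recursion polynomial.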

To analyze the time complexity, first note that the run-time is $O(T)$ times the number of queries made
by the algorithm, and we have assumed that $T$ is polynomial in $n$. Suppose the algorithm at level $k$ requires
$Q(k)$ queries. Then steps 2 and 4 require $mQ(k+1)$ queries each, steps 3 and 6 require $m$ queries each
and together $Q(k) = 2mQ(k+1) + 2m$. The base case is $k = \ell$, for which $Q(\ell) = 0$, since there are
no secret strings to calculate for the leaves. The total number of queries required for the algorithm is then
$Q(0) \approx (2m)^{2\ell}$. If we choose $\ell = \log n$ the quantum query complexity will thus be $n^{2\log 2m} = n^{O(1)}$ and
the quantum complexity will be polynomial in $n$ compared with the $n^{\Omega(\log n)}$ lower bound.
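
The recurrence $Q(k) = 2mQ(k+1) + 2m$ with $Q(\ell) = 0$ can be unrolled directly. The following sketch (function names are hypothetical) compares the iterated recurrence with the geometric sum $\sum_{j=1}^{\ell} (2m)^j$ and checks that it is at most $2(2m)^{\ell}$, so for constant $m$ and $\ell = \log n$ the query count is polynomial in $n$.

```python
def quantum_queries(m, depth):
    """Iterate Q(k) = 2 m Q(k+1) + 2 m upward from Q(depth) = 0 and return Q(0)."""
    q = 0
    for _ in range(depth):
        q = 2 * m * q + 2 * m
    return q


def closed_form(m, depth):
    """Geometric-series solution: Q(0) = sum_{j=1}^{depth} (2m)^j."""
    r = 2 * m
    return (r ** (depth + 1) - r) // (r - 1)


for m in (1, 3, 10):
    for depth in range(0, 8):
        q0 = quantum_queries(m, depth)
        assert q0 == closed_form(m, depth)
        assert q0 <= 2 * (2 * m) ** depth
```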

This concludes the demonstration of the polynomial-time quantum algorithm. Now we turn to the classical
$n^{\Omega(\log n)}$ lower bound. Our key technical result is the following lemma:

Lemma 5. Define the recursive oracle identification problem as above, with a function $f : A \times X \to \{0,1\}$
and a secret $s : \emptyset \cup X \cup X \times X \cup \dots \cup X^{\ell-1} \to A$ encoded in an oracle $O$. Fix a deterministic classical
algorithm that makes < Q queries to O. Then if s and ANS are chosen uniformly at random, the probability 
that ANS is output by the algorithm is 

Using Yao's minimax principle and plugging in $|A| = 2^{\Omega(n)}$, $\ell = \log n$ and $Q = n^{o(\log n)}$ readily yields

Theorem 6. If $\log|A| = n^{\Omega(1)}$ and $\ell = \Omega(\log n)$, then any randomized classical algorithm using $Q = n^{o(\log n)}$
queries will have $\frac{1}{2} + n^{-\Omega(\log n)}$ probability of successfully outputting ANS.

Proof (of Lemma 5). Let $T = \emptyset \cup X \cup \dots \cup X^{\ell}$ denote the tree on which the oracle is defined. We say that
a node $x \in T$ has been hit by the algorithm if position $x$ has been queried by the oracle together with the
correct secret, i.e. $O(x, s(x))$ has been queried. The only way to obtain information about ANS is
for the algorithm to query $\emptyset$ with the appropriate secret; in other words, to hit $\emptyset$.

For $x, y \in T$ we say that $x$ is an ancestor of $y$, and that $y$ is a descendant of $x$, if $y = x \times z$ for some
$z \in T$. If $z \in X$ then we say that $y$ is a child of $x$ and that $x$ is a parent of $y$. Now define $S \subset T$ to be the
set of all $x \in T$ such that $x$ has been hit but none of $x$'s ancestors have been. Also define a function $d(x)$ to
be the depth of a node $x$; i.e. for all $x \in X^k$, $d(x) = k$. We combine these definitions to declare an invariant

$$Z := \sum_{x \in S} \left( \frac{\log|A|}{3} \right)^{-d(x)}.$$

The key properties of Z we need are that: 

1. Initially Z = 0. 

2. If the algorithm is successful then it terminates with Z = 1. 

3. Only oracle queries change the value of Z. 

4. Querying a leaf can add at most $(\log|A|/3)^{-\ell}$ to $Z$.

5. Querying an internal node (i.e. not a leaf) can add at most $2/(|A|^{1/3} - Q)$ to $\mathbb{E} Z$, where $\mathbb{E}$ indicates the
expectation over random choices of $s$.

Combining these facts yields the desired bound.

Properties 1-4 follow directly from the definition (with the inequality in property 4 because it is possible
to query a node that has already been hit). To establish property 5, suppose that the algorithm queries node
$x \in T$ and that it has previously hit $k$ of $x$'s children. This gives us some partial information about $s(x)$.
We can model this information as a partition of $A$ into $2^k$ disjoint sets $A_1, \dots, A_{2^k}$ (of which some could
be empty). From the $k$ bits returned by the oracle on the $k$ children of $x$ we have successfully queried, we
know not only that $s(x) \in A$, but that $s(x) \in A_i$ for some $i \in \{1, \dots, 2^k\}$.

We will now divide the analysis into two cases. Either $k \le \frac{1}{3}\log|A|$ or $k > \frac{1}{3}\log|A|$. We will argue
that in the former case, $|A_i|$ is likely to be large, and so we are unlikely to successfully guess $s(x)$, while
in the latter case even a successful guess will not increase $Z$. The latter case ($k > \frac{1}{3}\log|A|$) is easier, so
we consider it first. In this case, $Z$ only changes if $x$ is hit in this step and neither $x$ nor any of its ancestors
have been previously hit. Then even though hitting $x$ will contribute $(\log|A|/3)^{-d(x)}$ to $Z$, it will also
remove the $k$ children from $S$ (as well as any other descendants of $x$), which will decrease $Z$ by at least
$k(\log|A|/3)^{-d(x)-1} > (\log|A|/3)^{-d(x)}$, resulting in a net decrease of $Z$.

Now suppose that $k \le \frac{1}{3}\log|A|$. Recall that our information about $s(x)$ can be expressed by the fact
that $s(x) \in A_i$ for some $i \in \{1, \dots, 2^k\}$. Since the values of $s$ were chosen uniformly at random, we
have $\Pr(A_i) = |A_i|/|A|$. Say that a set $A_i$ is bad if $|A_i| \le |A|^{2/3}/2^k$. Then for a particular bad set $A_i$,
$\Pr(A_i) \le |A|^{-1/3} 2^{-k}$. From the union bound, we see that the probability that any bad set is chosen is
$\le |A|^{-1/3}$.
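
The union bound in this step can be written out explicitly. There are at most $2^k$ sets $A_i$ in the partition, so summing the single-set bound over all bad sets gives the following (a worked restatement of the step above, not an additional claim):

```latex
\Pr[\text{some bad set is chosen}]
  \;\le\; \sum_{i \,:\, A_i \text{ bad}} \Pr(A_i)
  \;\le\; 2^{k} \cdot |A|^{-1/3}\, 2^{-k}
  \;=\; |A|^{-1/3}.
```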

Assume then that we have chosen a good set $A_i$, meaning that conditioned on the values of the children
there are $|A_i| \ge |A|^{2/3}/2^k \ge |A|^{1/3}$ possible values of $s(x)$. However, previous failed queries at $x$ may
also have ruled out specific possible values of $s(x)$. There have been at most $Q$ queries at $x$, so there are
$\ge |A|^{1/3} - Q$ possible values of $s(x)$ remaining. (Queries to any other nodes in the graph yield no information
on $s(x)$.) Thus the probability of hitting $x$ is $\le 1/(|A|^{1/3} - Q)$ if we have chosen a good set. We also have a
$\le |A|^{-1/3}$ probability of choosing a bad set, so the total probability of hitting $x$ (in the $k \le \frac{1}{3}\log|A|$ case)
is $\le |A|^{-1/3} + 1/(|A|^{1/3} - Q) \le 2/(|A|^{1/3} - Q)$. Finally, hitting $x$ will increase $Z$ by at most one, so the
largest possible increase of $\mathbb{E} Z$ when querying a non-leaf node is $\le 2/(|A|^{1/3} - Q)$. This completes the
proof of property 5 and thus the Lemma.

4 Dispersing Circuits 

In this section we define dispersing circuits and show how to construct an oracle problem with a constant 
versus linear separation from any such circuit. In the next sections we will show how to find dispersing 
circuits. Our strategy for finding speedups will be to start with a unitary circuit U which acts on n qubits and 
has size polynomial in n. We will then try to find an oracle for which U efficiently solves the corresponding 
oracle identification problem. Next we need to define a state $|\varphi_a\rangle$ that can be prepared with $O(1)$ oracle calls
and has overlap with $U^\dagger|a\rangle$. This is accomplished by letting $|\varphi_a\rangle$ be a state of the form $2^{-n/2} \sum_x \pm|x\rangle$.
We can prepare $|\varphi_a\rangle$ with only two oracle calls (or one, depending on the model), but to guarantee that
$|\langle a|U|\varphi_a\rangle|$ can be made large, we will need an additional condition on $U$. For any $a \in A$, $U^\dagger|a\rangle$ should have
amplitude that is mostly spread out over the entire computational basis. When this is the case, we say that U
is dispersing. The precise definition is as follows: 

Definition 7. Let $U$ be a quantum circuit on $n$ qubits. For $0 < \alpha, \beta \le 1$, we say that $U$ is $(\alpha, \beta)$-dispersing
if there exists a set $A \subseteq \{0,1\}^n$ with $|A| \ge 2^{\alpha n}$ and

$$\sum_{x \in \{0,1\}^n} |\langle a|U|x\rangle| \ge \beta\, 2^{n/2} \qquad (3)$$

for all $a \in A$.
Note that the LHS of (3) can also be interpreted as the $L_1$ norm of $U^\dagger|a\rangle$.

The speedup in [BV97] uses $U = H^{\otimes n}$, which is (1,1)-dispersing since $\sum_x |\langle a|H^{\otimes n}|x\rangle| = 2^{n/2}$ for all
$a$. Similarly the QFT over the cyclic group is (1,1)-dispersing (see footnote 5). Nonabelian QFTs do not necessarily have
the same strong dispersing properties, but they satisfy a weaker definition that is still sufficient for a quantum
speedup. Suppose that the measurement operator is instead defined as $\Pi_a = U^\dagger (|a\rangle\langle a| \otimes I) U$, where $a$ is a
string on $m$ bits and $I$ denotes the identity operator on $n - m$ bits. Then $U$ still permits oracle identification,
but our requirements that $U$ be dispersing are now relaxed. Here, we give a definition that is loose enough
for our purposes, although further weakening would still be possible.
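
The (1,1)-dispersing claims for $H^{\otimes n}$ and the cyclic QFT can be verified by brute force for small $n$. The sketch below uses hypothetical helper names and evaluates the left-hand side of (3) for every $a$, returning the largest $\beta$ for which (3) holds with $A = \{0,1\}^n$. The identity circuit is included as a contrasting, non-dispersing example, for which $\beta$ shrinks to $2^{-n/2}$.

```python
import cmath
import math


def hadamard_entry(a, x, n):
    """<a| H^{tensor n} |x> = (-1)^{a.x} / 2^{n/2}."""
    return (-1) ** (bin(a & x).count("1") % 2) / 2 ** (n / 2)


def cyclic_qft_entry(a, x, n):
    """<a| QFT |x> over Z_N with N = 2^n: exp(2 pi i a x / N) / sqrt(N)."""
    N = 2 ** n
    return cmath.exp(2j * math.pi * a * x / N) / math.sqrt(N)


def identity_entry(a, x, n):
    """<a| I |x>: a circuit that does not spread amplitude at all."""
    return 1.0 if a == x else 0.0


def dispersing_beta(entry, n):
    """Largest beta with sum_x |<a|U|x>| >= beta * 2^{n/2} for every a in {0,1}^n."""
    N = 2 ** n
    worst = min(sum(abs(entry(a, x, n)) for x in range(N)) for a in range(N))
    return worst / 2 ** (n / 2)


assert abs(dispersing_beta(hadamard_entry, 4) - 1) < 1e-9
assert abs(dispersing_beta(cyclic_qft_entry, 3) - 1) < 1e-9
```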

Definition 8. Let $U$ be a quantum circuit on $n$ qubits. For $0 < \alpha, \beta \le 1$ and $0 < m \le n$, we say that $U$
is $(\alpha, \beta)$-pseudo-dispersing if there exists a set $A \subseteq \{0,1\}^m$ with $|A| \ge 2^{\alpha n}$ such that for all $a \in A$ there
exists a unit vector $|\psi\rangle \in \mathbb{C}^{2^{n-m}}$ such that

$$\sum_{x \in \{0,1\}^n} |\langle a|\langle\psi|U|x\rangle| \ge \beta\, 2^{n/2}. \qquad (4)$$

This is a weaker property than being dispersing, meaning that any $(\alpha, \beta)$-dispersing circuit is also $(\alpha, \beta)$-
pseudo-dispersing.

We can now state our basic constant vs. linear query separation. 

Theorem 9. If $U$ is $(\alpha, \beta)$-pseudo-dispersing, then there exists an oracle problem which can be solved with
one query, one use of $U$ and success probability $(2\beta/\pi)^2$. However, any classical randomized algorithm
that succeeds with probability $\ge \delta$ must use $\ge \alpha n + \log \delta$ queries.

Before we prove this Theorem, we state a Lemma about how well states of the form $2^{-n/2} \sum_x e^{i\phi_x} |x\rangle$
can be approximated by states of the form $2^{-n/2} \sum_x \pm|x\rangle$.

Lemma 10. For any vector $(x_1, \dots, x_d) \in \mathbb{C}^d$ there exists $(\theta_1, \dots, \theta_d) \in \{\pm 1\}^d$ such that

$$\left| \sum_{k=1}^{d} x_k \theta_k \right| \ge \frac{2}{\pi} \sum_{k=1}^{d} |x_k|.$$

The proof is in the full version of the paper [?].
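
Lemma 10, read as the existence of signs $\theta_k \in \{\pm 1\}$ with $|\sum_k x_k \theta_k| \ge \frac{2}{\pi} \sum_k |x_k|$, can be sanity-checked by exhaustive search for small $d$. The sketch below is not the proof from the full paper. The function `halfplane_signs` (a hypothetical name) illustrates one natural way such signs might be chosen, by rounding each $x_k$ according to the sign of $\mathrm{Re}(x_k e^{-i\phi})$ for a fixed angle $\phi$; the brute-force search is what the assertion relies on.

```python
import cmath
import itertools
import math
import random


def best_sign_sum(xs):
    """Brute-force maximum over theta in {+1,-1}^d of |sum_k theta_k x_k|."""
    return max(
        abs(sum(t * x for t, x in zip(signs, xs)))
        for signs in itertools.product((1, -1), repeat=len(xs))
    )


def halfplane_signs(xs, phi):
    """Round each x_k to the sign of Re(x_k e^{-i phi}) (illustrative choice only)."""
    return [1 if (x * cmath.exp(-1j * phi)).real >= 0 else -1 for x in xs]


rng = random.Random(0)
for _ in range(20):
    xs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
    l1 = sum(abs(x) for x in xs)
    best = best_sign_sum(xs)
    # The lemma guarantees that the optimum reaches at least (2/pi) * ||x||_1.
    assert best >= (2 / math.pi) * l1 - 1e-12
    # Any particular rounding is, of course, no better than the optimum.
    rounded = halfplane_signs(xs, 0.3)
    assert abs(sum(s * x for s, x in zip(rounded, xs))) <= best + 1e-12
```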

Proof of Theorem 9: Since $U$ is $(\alpha, \beta)$-pseudo-dispersing, there exists a set $A \subseteq \{0,1\}^m$ with $|A| \ge 2^{\alpha n}$
and satisfying (4) for each $a \in A$. The problem will be to determine $a$ by querying an oracle $O_a(x)$. No
matter how we define the oracle, as long as it returns only one bit per call any classical randomized algorithm
making $q$ queries can have success probability no greater than $2^{q - \alpha n}$ (or else guessing could succeed with
probability $> 2^{-\alpha n}$ without making any queries). This implies the classical lower bound.

Given $a \in A$, to define the oracle $O_a$, first use the definition to choose a state $|\psi\rangle$ satisfying (4). Then
by Lemma 10, choose a vector $\theta \in \{\pm 1\}^{2^n}$ that (when normalized to $|\theta\rangle$) will approximate the state $U^\dagger|a\rangle|\psi\rangle$.
Define $O_a(x)$ so that $(-1)^{O_a(x)} = \theta_x = 2^{n/2} \langle x|\theta\rangle$. By construction,

$$2^{-n/2} \left| \sum_{x} \theta_x \langle a|\langle\psi|U|x\rangle \right| = |\langle a|\langle\psi|U|\theta\rangle| \ge \frac{2}{\pi}\beta \qquad (5)$$

which implies that creating $|\theta\rangle$, applying $U$, and measuring the first register has probability $\ge (2\beta/\pi)^2$ of
yielding the correct answer $a$. □



5 Another possible way to generalize [BV97] is to consider other unitaries of the form $U = A^{\otimes n}$, for $A \in U_2$. However, it is not
hard to show that the only way for such a $U$ to be $(\Omega(1), \Omega(1))$-dispersing is for $A$ to be of the form $e^{i\phi_1 \sigma_z} H e^{i\phi_2 \sigma_z}$.



10 Sean Hallgren and Aram W. Harrow 



5 Any quantum Fourier transform is pseudo-dispersing 

In this section we start with some special cases of dispersing circuits by showing that any Fourier transform 
is dispersing. In the next section we show that most circuits are dispersing. 

The original RFS paper [BV97] used the fact that $H^{\otimes n}$ is (1,1)-dispersing to obtain their starting $O(1)$
vs. $\Omega(n)$ separation. The QFT on the cyclic group (or any abelian group, in fact) is also (1,1)-dispersing. In
fact, if we will accept a pseudo-dispersing circuit, then any QFT will work:

Theorem 11. Let $G$ be a group with irreps $\hat{G}$ and $d_\lambda$ denoting the dimension of irrep $\lambda$. Then the Fourier
transform over $G$ is $(\alpha, 1/\sqrt{2})$-pseudo-dispersing, where $\alpha = (\log \sum_\lambda d_\lambda)/\log|G| \ge 1/2$.

Via Theorem 9 and Theorem 2, this implies that any QFT can be used to obtain a superpolynomial quantum 
speedup. For most nonabelian QFTs, this is the first example of a problem which they can solve more quickly 
than a classical computer. 

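Proof (of Theorem 11). Let $A = \{(\lambda, i) : \lambda \in \hat{G}, i \in \{1, \dots, d_\lambda\}\}$.

Let $V_\lambda$ denote the representation space corresponding to an irrep $\lambda \in \hat{G}$. The Fourier transform on $G$
maps vectors in $\mathbb{C}[G]$ to superpositions of vectors of the form $|\lambda\rangle|v_1\rangle|v_2\rangle$ for $|v_1\rangle, |v_2\rangle \in V_\lambda$.

Fix a particular choice of $\lambda$ and $|i\rangle \in V_\lambda$. If $U$ denotes the QFT on $G$ then let

$$\rho = U^\dagger \left( |\lambda\rangle\langle\lambda| \otimes |i\rangle\langle i| \otimes \frac{I_{V_\lambda}}{d_\lambda} \right) U.$$

Define $V := \mathrm{supp}\, \rho$, and let $\mathbb{E}_{|\psi\rangle \in V}$ denote an expectation over $|\psi\rangle$ chosen uniformly at random from unit
vectors in $V$ (see footnote 6). Finally, let $\Pi$ be the projector onto $V$. Note that $\rho = \Pi/d_\lambda = \mathbb{E}\, |\psi\rangle\langle\psi|$.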

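Because of the invariance of $\rho$ under right-multiplication by group elements (i.e. $\langle g_1|\rho|g_2\rangle = \langle g_1 h|\rho|g_2 h\rangle$
for all $g_1, g_2, h \in G$), we have for any $g$ that

$$\langle g|\rho|g\rangle = \frac{1}{|G|}\sum_{h \in G} \langle gh|\rho|gh\rangle = \frac{\operatorname{tr}(\rho)}{|G|} = \frac{1}{|G|}. \tag{6}$$

Since $\mathbb{E}_{|\psi\rangle}\,|\psi\rangle\langle\psi| = \rho$, (6) implies that

$$\mathbb{E}_{|\psi\rangle \in V}\, |\langle g|\psi\rangle|^2 = \langle g|\rho|g\rangle = \frac{1}{|G|}.$$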

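Next, we would like to analyze $\mathbb{E}_{|\psi\rangle}\,|\langle g|\psi\rangle|^4$.

$$\mathbb{E}_{|\psi\rangle}\, |\langle g|\psi\rangle|^4 = \mathbb{E}_{|\psi\rangle} \operatorname{tr}\,(|g\rangle\langle g| \otimes |g\rangle\langle g|) \cdot (|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|) \tag{7}$$

$$= \operatorname{tr}\,(|g\rangle\langle g| \otimes |g\rangle\langle g|)\, \frac{(I + \mathrm{SWAP})(\Pi \otimes \Pi)}{d_\lambda(d_\lambda + 1)} \tag{8}$$

$$\le \operatorname{tr}\,(|g\rangle\langle g| \otimes |g\rangle\langle g|) \cdot (I + \mathrm{SWAP})(\rho \otimes \rho) \tag{9}$$

$$= 2\,(\langle g|\rho|g\rangle)^2 = \frac{2}{|G|^2}. \tag{10}$$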

To prove the equality on the second line, we use a standard representation-theoretic trick (cf. section V.B
of [PSW06]). First note that $|\psi\rangle^{\otimes 2}$ belongs to the symmetric subspace of $V \otimes V$, which is a $\frac{d_\lambda(d_\lambda+1)}{2}$-dimensional
irrep of $U_{d_\lambda}$. Since $\mathbb{E}_{|\psi\rangle}\,|\psi\rangle\langle\psi|^{\otimes 2}$ is invariant under conjugation by $u \otimes u$ for any $u \in U_{d_\lambda}$,
it follows that $\mathbb{E}_{|\psi\rangle}\,|\psi\rangle\langle\psi|^{\otimes 2}$ is proportional to a projector onto the symmetric subspace of $V^{\otimes 2}$. Finally,
$\mathrm{SWAP}\,\Pi^{\otimes 2}$ has eigenvalue $1$ on the symmetric subspace of $V^{\otimes 2}$ and eigenvalue $-1$ on its orthogonal
complement, the antisymmetric subspace of $V^{\otimes 2}$. Thus, $\frac{I + \mathrm{SWAP}}{2}\,\Pi^{\otimes 2}$ projects onto the symmetric subspace and
we conclude that

$$\mathbb{E}_{|\psi\rangle}\, |\psi\rangle\langle\psi|^{\otimes 2} = \frac{(I + \mathrm{SWAP})(\Pi \otimes \Pi)}{d_\lambda(d_\lambda + 1)}.$$

6 We can think of $|\psi\rangle$ either as the result of applying a Haar uniform unitary to a fixed unit vector, or as the result of choosing
$|\psi'\rangle$ from any rotationally invariant ensemble (e.g. choosing the real and imaginary part of each component to be an i.i.d.
Gaussian with mean zero) and setting $|\psi\rangle = |\psi'\rangle / \sqrt{\langle\psi'|\psi'\rangle}$.

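Now we note the inequality

$$\mathbb{E}|Y| \ge \frac{(\mathbb{E}Y^2)^{3/2}}{(\mathbb{E}Y^4)^{1/2}}, \tag{11}$$

which holds for any random variable $Y$ and can be proved using Hölder's inequality [Ber97]. Setting $Y =
|\langle g|\psi\rangle|$, we can bound $\mathbb{E}_{|\psi\rangle}\,|\langle g|\psi\rangle| \ge 1/\sqrt{2|G|}$. Summing over $g \in G$, we find

$$\mathbb{E}_{|\psi\rangle} \sum_{g \in G} |\langle g|\psi\rangle| \ge \frac{1}{\sqrt{2}}\sqrt{|G|}.$$

Finally, because this last inequality holds in expectation, it must also hold for at least some choice of $|\psi\rangle$.
Thus there exists $|\psi\rangle \in V$ such that

$$\sum_{g \in G} |\langle g|\psi\rangle| \ge \frac{1}{\sqrt{2}}\sqrt{|G|}.$$

Then $U$ satisfies the pseudo-dispersing condition in (4) for the state $|\lambda, i\rangle$ with $\beta = 1/\sqrt{2}$.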

This construction works for each $\lambda \in \hat G$ and for $|v_1\rangle$ running over any choice of basis of $V_\lambda$. Together,
this comprises $\sum_{\lambda \in \hat G} d_\lambda$ vectors in the set $A$.
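For completeness, one way to see inequality (11) is to apply Hölder's inequality with exponents $3/2$ and $3$. The following is a sketch of that standard argument, not a quotation from [Ber97]:

```latex
\begin{align*}
\mathbb{E}\,Y^2 = \mathbb{E}\bigl[\,|Y|^{2/3}\,|Y|^{4/3}\bigr]
  &\le \bigl(\mathbb{E}\,|Y|\bigr)^{2/3}\,\bigl(\mathbb{E}\,Y^4\bigr)^{1/3}, \\
\text{hence}\quad \bigl(\mathbb{E}\,Y^2\bigr)^{3}
  &\le \bigl(\mathbb{E}\,|Y|\bigr)^{2}\;\mathbb{E}\,Y^4
  \quad\Longrightarrow\quad
  \mathbb{E}\,|Y| \ge \frac{(\mathbb{E}\,Y^2)^{3/2}}{(\mathbb{E}\,Y^4)^{1/2}}.
\end{align*}
```

With $\mathbb{E}Y^2 = 1/|G|$ and $\mathbb{E}Y^4 \le 2/|G|^2$ from (6) and (10), the right-hand side is at least $|G|^{-3/2} \cdot |G|/\sqrt{2} = 1/\sqrt{2|G|}$, which is the bound used in the proof.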



6 Most circuits are dispersing 

Our final, and most general, method of constructing dispersing circuits is simply to choose a polynomial-size
random circuit. We define a length-$t$ random circuit to consist of performing the following steps $t$ times.

1. Choose two distinct qubits $i, j$ at random from $[n]$.

2. Choose a Haar-distributed random $U \in U_4$.

3. Apply $U$ to qubits $i$ and $j$.
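The three steps above can be sketched numerically as follows. This is a minimal state-vector simulation under stated assumptions: the function names are hypothetical, qubit 0 is taken to be the most significant bit of the basis-state index, and the Haar sampler uses the usual QR-with-phase-correction recipe rather than anything specified in the paper.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale column k by phases[k] to remove the QR phase ambiguity

def apply_two_qubit_gate(state, gate, i, j, n):
    """Apply a 4x4 gate to qubits i and j of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, [i, j], [0, 1]).reshape(4, -1)
    psi = (gate @ psi).reshape([2, 2] + [2] * (n - 2))
    return np.moveaxis(psi, [0, 1], [i, j]).reshape(-1)

def random_circuit_state(a, n, t, rng):
    """Start from the basis state with index a and apply t random two-qubit gates."""
    state = np.zeros(2 ** n, dtype=complex)
    state[a] = 1.0
    for _ in range(t):
        i, j = rng.choice(n, size=2, replace=False)  # step 1: two distinct qubits
        gate = haar_unitary(4, rng)                  # step 2: Haar-random element of U_4
        state = apply_two_qubit_gate(state, gate, i, j, n)  # step 3
    return state

rng = np.random.default_rng(0)
psi = random_circuit_state(a=0, n=4, t=50, rng=rng)
print(np.sum(np.abs(psi)), 2 ** (4 / 2))  # L1-norm of the output vs. 2^{n/2}
```

The printed comparison is only a qualitative illustration for one sample; it is not a substitute for the probabilistic statement in Theorem 12.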

A similar model of random circuits was considered in [DOP07]. Our main result about these random circuits
is the following theorem.

Theorem 12. For any $\alpha, \beta > 0$, there exists a constant $C$ such that if $U$ is a random circuit on $n$ qubits of
length $t = Cn^3$ then $U$ is $(\alpha, \beta)$-dispersing with probability

$$\ge 1 - \frac{2\beta^2}{1 - 2^{-n(1-\alpha)}}.$$

Theorem 12 is proved in the extended version of this paper [?]. The idea of the proof is to reduce the
evolution of the fourth moments of the random circuit (i.e. quantities of the form $\mathbb{E}_U \operatorname{tr} U M_1 U^\dagger M_2 U M_3 U^\dagger M_4$)
to a classical Markov chain, using the approach of [DOP07]. Then we show that this Markov chain has a gap
of $\Omega(1/n^2)$, so that circuits of length $O(n^3)$ have fourth moments nearly identical to those of Haar-uniform
unitaries from $U_{2^n}$. Finally, we use (11), just as we did for quantum Fourier transforms, to show that a large
fraction of inputs are likely to be mapped to states with large $L_1$-norm. This will prove Theorem 12 and
show that superpolynomial quantum speedups can be built by plugging almost any circuit into the recursive
framework we describe in Section 3.
A Most circuits are dispersing

In this Appendix we prove Theorem 12. 

Suppose we start in a computational basis state $|a\rangle$, and after $t$ steps of a random circuit (described in
Section 6), we have the state $|\psi_t\rangle$. Let $U^\dagger$ denote the circuit we have applied, and let $\psi_t$ denote $|\psi_t\rangle\langle\psi_t|$, so
that $\psi_t = U^\dagger|a\rangle\langle a|U$. For $p \in \{0,1,2,3\}^n$ let $\sigma_p$ denote the tensor product of Pauli matrices $\sigma_{p_1} \otimes \cdots \otimes \sigma_{p_n}$,
where $\{\sigma_0, \sigma_1, \sigma_2, \sigma_3\}$ are the usual single-qubit Pauli matrices $\{I, \sigma_z, \sigma_x, \sigma_y\}$. Then we can expand $\psi_t$ in
the Pauli basis (following [DOP07]) as

$$\psi_t = 2^{-n/2} \sum_p \gamma_t(p)\,\sigma_p,$$

where $\gamma_t(p) = 2^{-n/2} \operatorname{tr} \psi_t \sigma_p$. The advantage of this approach is that each $\gamma_t(p)$ is real and $\sum_p \gamma_t^2(p) = 1$, so
we can think of $\{\gamma_t^2(p)\}$ as a probability distribution on $p \in \{0,1,2,3\}^n$.
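To make the Pauli expansion concrete, here is a small sketch that computes the coefficients $\gamma(p) = 2^{-n/2}\operatorname{tr}\psi\,\sigma_p$ for a pure state and lets one confirm numerically that the squares sum to one. It uses the ordering $(I, \sigma_z, \sigma_x, \sigma_y)$ stated in the text; the function names are illustrative, and the brute-force loop over all $4^n$ strings is only practical for a handful of qubits.

```python
import itertools
import numpy as np

# Single-qubit Paulis in the ordering used in the text: (I, sigma_z, sigma_x, sigma_y).
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]

def pauli_string(p):
    """Tensor product sigma_{p_1} x ... x sigma_{p_n} for p in {0,1,2,3}^n."""
    out = np.array([[1.0 + 0j]])
    for k in p:
        out = np.kron(out, PAULIS[k])
    return out

def pauli_coefficients(state):
    """Return {p: gamma(p)} with gamma(p) = 2^{-n/2} tr(psi sigma_p), psi = |state><state|."""
    n = state.size.bit_length() - 1  # state.size is assumed to be 2^n
    rho = np.outer(state, state.conj())
    coeffs = {}
    for p in itertools.product(range(4), repeat=n):
        # The trace is real because rho and sigma_p are both Hermitian.
        coeffs[p] = (np.trace(rho @ pauli_string(p)) / 2 ** (n / 2)).real
    return coeffs

rng = np.random.default_rng(0)
psi = rng.standard_normal(8) + 1j * rng.standard_normal(8)
psi /= np.linalg.norm(psi)
gamma = pauli_coefficients(psi)
print(sum(g ** 2 for g in gamma.values()))  # close to 1 for any pure state
```

The weight on the all-zero string is always $\gamma^2(0^n) = 2^{-n}$, since $\operatorname{tr}\psi = 1$; this is the isolated vertex that reappears in Lemma 13.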

Indeed, by an argument similar to that in [DOP07], we can show that $\{\mathbb{E}_{\mathcal{C}}\,\gamma_t^2(p)\}$ evolves according to
a Markov chain on $\{0,1,2,3\}^n$. (Here the expectation is taken over the choice of
random quantum circuit.)

Lemma 13. Random quantum circuits are such that $\{\mathbb{E}_{\mathcal{C}}\,\gamma_t^2(p)\}$ evolves according to the following Markov
chain on $\{0,1,2,3\}^n$:

- Select $i \neq j$ randomly from $[n]$.

- If $p_i = p_j = 0$ then do nothing.

- Otherwise set $(p_i, p_j)$ to a random element of $\{0,1,2,3\}^2 \setminus \{(0,0)\}$.

Furthermore, this Markov chain satisfies the following properties.

1. This Markov chain is irreducible and ergodic once we delete the isolated vertex $0^n$.

2. Its stationary distribution (when starting with any physical state) has $\gamma^2(0^n) = 2^{-n}$, but otherwise is
uniform on $p \neq 0^n$, i.e. $\gamma^2(p) = 4^{-n}/(1 + 2^{-n})$ for all $p \in \{0,1,2,3\}^n$ with $p \neq 0^n$.

3. Its spectral gap is $\ge \Omega(1/n^2)$.

4. There exists a constant $C$ such that if $t \ge Cn^3$ and if the initial state is a computational basis state then
$\mathbb{E}_{\mathcal{C}}\,\gamma_t^2(p) \le 4^{-n}$ for all $p \neq 0^n$.
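For very small $n$ the chain in Lemma 13 can be written down exactly, which gives a quick sanity check on the stationary distribution in item 2. The sketch below builds the full $4^n \times 4^n$ transition matrix; the function names are illustrative, and choosing an unordered pair $\{i,j\}$ is assumed to be equivalent to choosing an ordered pair because the update rule is symmetric in $i$ and $j$.

```python
import itertools
import numpy as np

def transition_matrix(n):
    """Exact transition matrix of the Lemma 13 chain on {0,1,2,3}^n."""
    states = list(itertools.product(range(4), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    pairs = list(itertools.combinations(range(n), 2))
    nonzero = [v for v in itertools.product(range(4), repeat=2) if v != (0, 0)]
    P = np.zeros((len(states), len(states)))
    for s in states:
        for i, j in pairs:
            if s[i] == 0 and s[j] == 0:
                P[index[s], index[s]] += 1 / len(pairs)  # do nothing
                continue
            for a, b in nonzero:  # resample (p_i, p_j) uniformly from the 15 nonzero values
                target = list(s)
                target[i], target[j] = a, b
                P[index[s], index[tuple(target)]] += 1 / (len(pairs) * len(nonzero))
    return states, P

def claimed_stationary(n, states):
    """Distribution from item 2: mass 2^{-n} on 0^n, uniform elsewhere."""
    zero = (0,) * n
    return np.array([2.0 ** -n if s == zero else 4.0 ** -n / (1 + 2.0 ** -n) for s in states])

states, P = transition_matrix(3)
pi = claimed_stationary(3, states)
print(np.allclose(pi @ P, pi), np.isclose(pi.sum(), 1.0))
```

Because the transition matrix restricted to the nonzero strings is symmetric, the uniform distribution on those strings is stationary, which is the detailed-balance calculation referred to in the proof.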

Before proving the Lemma, we show how it implies Theorem 12. Fix an input $|a\rangle$, apply $t$ random two-qubit
unitaries as described above to yield the state $|\psi_t\rangle$, and define $Q_t := \sum_x |\langle x|\psi_t\rangle|^4$. Expanding $\psi_t$ in
terms of $\sigma_p$, we obtain

$$Q_t = 2^{-n} \sum_x \langle x| \sum_{p \in \{0,1,2,3\}^n} \gamma_t(p)\,\sigma_p\, |x\rangle\langle x| \sum_{q \in \{0,1,2,3\}^n} \gamma_t(q)\,\sigma_q\, |x\rangle.$$

Now $\langle x|\sigma_p|x\rangle$ will be zero if $p$ contains any 2's or 3's, since each of these leads to bit flips. So we can restrict
our sum to $p$'s and $q$'s that are strings of 0's and 1's (corresponding to $I$ and $\sigma_z$). Moreover, if $p$ is such a
string, then $\langle x|\sigma_p|x\rangle = (-1)^{p \cdot x}$. Thus

$$Q_t = 2^{-n} \sum_{p \in \{0,1\}^n,\ q \in \{0,1\}^n} \gamma_t(p)\,\gamma_t(q) \sum_x (-1)^{(p+q)\cdot x} = \sum_{p \in \{0,1\}^n} \gamma_t^2(p).$$

To bound this sum, we use the last part of Lemma 13 together with $\gamma^2(0^n) = 2^{-n}$ to find

$$\mathbb{E}\,Q_t \le 2 \cdot 2^{-n}.$$

Thus, Markov's inequality implies that

$$\Pr\left[Q_t \ge 2^{-n}/\beta^2\right] \le 2\beta^2.$$

Now consider the event that $Q_t \le 2^{-n}/\beta^2$. We will use this to lower-bound $\sum_x |\langle x|\psi_t\rangle|$. To do so,
we define a random variable $Y$ to equal $|\langle x|\psi_t\rangle|$ for a uniformly random choice of $x \in \{0,1\}^n$. Then
$\mathbb{E}_x Y^2 = 2^{-n}$, $\mathbb{E}_x Y^4 \le \beta^{-2} \cdot 4^{-n}$, and by (11), $\mathbb{E}_x Y = \mathbb{E}_x |Y| \ge 2^{-n/2}\beta$. Thus $\sum_x |\langle x|\psi_t\rangle| \ge \beta 2^{n/2}$.
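The step from a small fourth moment to a large $L_1$-norm can be checked numerically. Applying (11) to $Y = |\langle x|\psi\rangle|$ with $x$ uniform gives $\mathbb{E}_x Y^2 = 2^{-n}$ and $\mathbb{E}_x Y^4 = 2^{-n} Q$, which rearranges to $\sum_x |\langle x|\psi\rangle| \ge Q^{-1/2}$ for any unit vector. The sketch below, with illustrative function names, evaluates both sides.

```python
import numpy as np

def fourth_moment(state):
    """Q = sum_x |<x|psi>|^4 for a normalized state vector."""
    return np.sum(np.abs(state) ** 4)

def l1_norm(state):
    """sum_x |<x|psi>|."""
    return np.sum(np.abs(state))

def l1_lower_bound(state):
    """Lower bound 1/sqrt(Q) on the L1-norm, obtained from inequality (11)."""
    return 1.0 / np.sqrt(fourth_moment(state))

rng = np.random.default_rng(0)
n = 6
psi = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
psi /= np.linalg.norm(psi)
# The first number is always at least the second; both are compared against 2^{n/2}.
print(l1_norm(psi), l1_lower_bound(psi), 2 ** (n / 2))
```

In particular, $Q \le 2^{-n}/\beta^2$ gives $\sum_x |\langle x|\psi\rangle| \ge \beta 2^{n/2}$, and the bound is tight for the uniform superposition, where $Q = 2^{-n}$.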

Putting this together, we see that for any fixed input $|a\rangle$, and for all but a $2\beta^2$-fraction of (sufficiently
long) random circuits,

$$\sum_{x \in \{0,1\}^n} |\langle x|U^\dagger|a\rangle| \ge \beta 2^{n/2}.$$

Say that a pair $(U, a)$ is bad when this does not hold. So for any $a$, the probability over $U \in \mathcal{C}$ that $(U, a)$ is
bad is $\le 2\beta^2$. Thus Markov's inequality implies that

$$\Pr_{U \in \mathcal{C}}\left[\, \Pr_a[(U,a) \text{ is bad}] \ge 1 - 2^{-(1-\alpha)n} \right] \le \frac{2\beta^2}{1 - 2^{-(1-\alpha)n}}.$$

Turning this around, we conclude that a random circuit $U \in \mathcal{C}$ is $(\alpha, \beta)$-dispersing (meaning that $(U, a)$
is good for $\ge 2^{\alpha n}$ values of $a$) with probability $\ge 1 - 2\beta^2/(1 - 2^{-(1-\alpha)n})$. Thus, we can obtain a
quantum/classical separation from almost any quantum circuit with uniformly bounded parameters for both the
quantum upper bound and the classical lower bound.

Recent work [HL08] has improved the analysis of the Markov chain to show that the gap is $\Omega(1/n)$ and
hence that circuits of length $O(n^2)$ are generically dispersing.

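Proof of Lemma 13: The reduction of the quantum random circuit to a classical Markov chain is due to
[DOP07], but we will present an alternate, shorter proof in order to have a self-contained presentation.

First, we show that $\gamma_t(p)^2$ follows a Markov chain. Recall that $\psi_t = 2^{-n/2} \sum_p \gamma_t(p)\,\sigma_p$ and let $W$ denote
the random unitary applied at time $t + 1$. Then $\psi_{t+1} = W\psi_t W^\dagger$. Since $W$ acts only on two qubits, we can
assume for the purpose of this analysis that $n = 2$, so $W \in U_4$. Then

$$\gamma_{t+1}(p) = \frac{1}{2} \operatorname{tr} \sigma_p W \psi_t W^\dagger = \frac{1}{4} \sum_{q \in \{0,1,2,3\}^2} \gamma_t(q) \operatorname{tr} \sigma_p W \sigma_q W^\dagger,$$

and

$$\gamma_{t+1}^2(p) = \frac{1}{16} \sum_{q,q' \in \{0,1,2,3\}^2} \gamma_t(q)\,\gamma_t(q') \operatorname{tr} \sigma_p W \sigma_q W^\dagger \operatorname{tr} \sigma_p W \sigma_{q'} W^\dagger \tag{12}$$

$$= \sum_{q,q' \in \{0,1,2,3\}^2} \gamma_t(q)\,\gamma_t(q')\, \langle p,p|\, \mathrm{ad}_W^{\otimes 2}\, |q,q'\rangle. \tag{13}$$

Here $\mathrm{ad}_W$ is an operator on $\mathbb{C}^{16}$ defined by $\langle p|\,\mathrm{ad}_W\,|q\rangle := \operatorname{tr} \sigma_p W \sigma_q W^\dagger / 4$ for $p, q \in \{0,1,2,3\}^2$. When
we take the expectation over random choices of $W$, we obtain

$$\mathbb{E}_W\, \gamma_{t+1}^2(p) = \sum_{q,q' \in \{0,1,2,3\}^2} \gamma_t(q)\,\gamma_t(q')\, \langle p,p|\, \bigl(\mathbb{E}_W\, \mathrm{ad}_W^{\otimes 2}\bigr)\, |q,q'\rangle. \tag{14}$$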

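Define

$$|\Phi\rangle = \frac{1}{\sqrt{15}} \sum_{p \in \{0,1,2,3\}^2 \setminus \{(0,0)\}} |p\rangle|p\rangle. \tag{15}$$

We claim that $\mathbb{E}_W\, \mathrm{ad}_W^{\otimes 2} = |00\rangle\langle 00| + |\Phi\rangle\langle\Phi|$. Since $\langle p,p|\Phi\rangle\langle\Phi|q,q'\rangle$ is equal to $1/15$ when $p \neq 00$ and
$q = q' \neq 00$, and zero otherwise, we will have

$$\mathbb{E}_W\, \langle p,p|\, \mathrm{ad}_W^{\otimes 2}\, |q,q'\rangle = \begin{cases} 1 & \text{if } p = q = q' = 00 \\ \frac{1}{15} & \text{if } p \neq 00 \text{ and } q = q' \neq 00 \\ 0 & \text{otherwise} \end{cases}$$

as claimed in the lemma.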

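To prove that $\mathbb{E}_W\, \mathrm{ad}_W^{\otimes 2} = |00\rangle\langle 00| + |\Phi\rangle\langle\Phi|$, we will use representation theory. Schur's Lemma implies
that $\mathbb{E}_W\, \mathrm{ad}_W^{\otimes 2}$ is a projector onto the invariant subspace of $\mathrm{ad}_W^{\otimes 2}$. For any integers $\lambda_1 \ge \ldots \ge \lambda_d$, we have
an irrep of $U_d$, which we call $\mathcal{Q}^d_\lambda$. The conjugate irrep of $\mathcal{Q}^d_\lambda$, obtained by taking the complex conjugate of
each representation matrix, is given by $(\mathcal{Q}^d_\lambda)^* \cong \mathcal{Q}^d_{\lambda'}$, where $\lambda' = (-\lambda_d, -\lambda_{d-1}, \ldots, -\lambda_1)$.

The simplest non-trivial $U_d$ representation is the defining representation $\mathcal{Q}^d_{(1)} = \mathbb{C}^d$, where $U \in U_d$ is
mapped to itself. Here we have dropped trailing zeros, so $(1)$ is equivalent to $(1,0,0,0)$. Now observe that
$W \to \mathrm{ad}_W$ is a representation of $U_4$ that is equivalent to $\mathcal{Q}^4_{(1)} \otimes (\mathcal{Q}^4_{(1)})^* \cong \mathcal{Q}^4_{(1)} \otimes \mathcal{Q}^4_{(0,0,0,-1)}$. Now we
apply the Clebsch-Gordan decomposition of the adjoint representation $\mathcal{Q}^4_{(1)} \otimes \mathcal{Q}^4_{(0,0,0,-1)}$ to obtain

$$\mathcal{Q}^4_{(1)} \otimes \mathcal{Q}^4_{(0,0,0,-1)} \cong \mathcal{Q}^4_{(0)} \oplus \mathcal{Q}^4_{(1,0,0,-1)},$$

meaning the direct sum of a trivial irrep and a 15-dimensional irrep $\mathcal{Q}^4_{(1,0,0,-1)}$. To find the invariant
subspace of $\mathrm{ad}_W^{\otimes 2}$, we will need to find the invariant subspace of

$$\bigl(\mathcal{Q}^4_{(0)} \oplus \mathcal{Q}^4_{(1,0,0,-1)}\bigr)^{\otimes 2} = \bigl(\mathcal{Q}^4_{(0)}\bigr)^{\otimes 2} \oplus \bigl(\mathcal{Q}^4_{(0)} \otimes \mathcal{Q}^4_{(1,0,0,-1)}\bigr) \oplus \bigl(\mathcal{Q}^4_{(1,0,0,-1)} \otimes \mathcal{Q}^4_{(0)}\bigr) \oplus \bigl(\mathcal{Q}^4_{(1,0,0,-1)}\bigr)^{\otimes 2}.$$

Since $\mathcal{Q}^4_{(0)}$ is the trivial representation, $\mathcal{Q}^4_{(0)} \otimes \mathcal{Q}^4_{(0)}$ is trivial as well, and the projector onto it is given
by $|00\rangle\langle 00|$. Next, $\mathcal{Q}^4_{(0)} \otimes \mathcal{Q}^4_{(1,0,0,-1)} \cong \mathcal{Q}^4_{(1,0,0,-1)}$, and thus has no invariant subspace. Similarly,
$\mathcal{Q}^4_{(1,0,0,-1)} \otimes \mathcal{Q}^4_{(0)}$ has no invariant subspace. $\mathcal{Q}^4_{(1,0,0,-1)}$ is self-dual, and thus by Schur's Lemma, $\mathcal{Q}^4_{(1,0,0,-1)} \otimes
\mathcal{Q}^4_{(1,0,0,-1)}$ has a one-dimensional invariant subspace. To determine this subspace we observe that in the basis
$\{|p\rangle\}_{p \in \{0,1,2,3\}^2 \setminus \{(0,0)\}}$, the representation matrices of $\mathcal{Q}^4_{(1,0,0,-1)}$ are real. Also $(A \otimes I)|\Phi\rangle = (I \otimes A^T)|\Phi\rangle$ for
any matrix $A$ and for $|\Phi\rangle$ defined in (15). Together this means that $|\Phi\rangle$ is an invariant vector in $\mathcal{Q}^4_{(1,0,0,-1)} \otimes
\mathcal{Q}^4_{(1,0,0,-1)}$, and in fact spans its invariant subspace. We conclude that $\mathbb{E}_W\, \mathrm{ad}_W^{\otimes 2} = |00\rangle\langle 00| + |\Phi\rangle\langle\Phi|$.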

We now turn to the analysis of the classical Markov chain. Similar arguments were used in [DOP07] and
a tighter analysis is forthcoming in [HL08]. First, we claim that the Markov chain is ergodic and irreducible
outside the vertex $0^n$. Irreducibility follows from the fact that every nonzero vertex is connected to $1^n$, while
the presence of self-loops implies ergodicity. We can verify the stationary distribution from the detailed
balance condition using a short calculation.

For the gap we will use the comparison method of Diaconis and Saloff-Coste [DS96]. Consider a Markov
chain that picks two random sites and replaces them each with random numbers from $\{0,1,2,3\}$ subject
only to avoiding the state $0^n$. It follows from [DS96, Thm 3.2] that this chain has gap $\ge 2/n$, and applying
the comparison method ([DS96, Thm 3.3]) to the Markov chain described in Lemma 13 yields a lower bound
of $1/n^2$ for its gap.

To prove the final claim of Lemma 13, we want to choose $C$ sufficiently large so that

$$\left|\, \mathbb{E}\,\gamma_t^2(p) - \frac{4^{-n}}{1 + 2^{-n}} \right| \le 2^{-4n} \tag{16}$$

whenever $t \ge Cn^3$. Note that any computational basis state has overlap $\exp(-O(n))$ with the stationary
distribution. Since the gap is $\Omega(1/n^2)$, we can then use standard bounds on the mixing time of Markov
chains (e.g. [DS96, Lemma 2.8]) to show that (16) holds when $t \ge Cn^3$ for some constant $C$.



B Proof of Lemma 10 

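We want to bound

$$\max_{\theta \in \{\pm 1\}^d} \left| \sum_{k=1}^d \theta_k x_k \right| = \max_\theta \max_\phi\, \mathrm{Re}\left( e^{i\phi} \sum_{k=1}^d \theta_k x_k \right).$$

Here $\phi$ is maximized over $[0, 2\pi]$, and $\mathrm{Re}(z)$ refers to the real part of a complex number $z$. We can now
move the maximization over $\theta$ inside the sum to obtain

$$\max_\theta \max_\phi\, \mathrm{Re}\left( e^{i\phi} \sum_{k=1}^d \theta_k x_k \right) = \max_\phi \sum_{k=1}^d \max_{\theta_k \in \{\pm 1\}} \mathrm{Re}\left( e^{i\phi} \theta_k x_k \right) \tag{17}$$

$$= \max_\phi \sum_{k=1}^d \left| \mathrm{Re}(e^{i\phi} x_k) \right| \ge \mathbb{E}_\phi \sum_{k=1}^d \left| \mathrm{Re}(e^{i\phi} x_k) \right| \tag{18}$$

$$= \mathbb{E}_\phi \sum_{k=1}^d |x_k| \cdot |\cos\phi| = \frac{2}{\pi} \sum_{k=1}^d |x_k|. \tag{19}$$

$\mathbb{E}_\phi$ indicates the expectation over $\phi$ chosen uniformly at random from $[0, 2\pi]$, and the second to last equality
uses the rotational invariance of the distribution of $\phi$. □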
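The averaging argument above is non-constructive in $\phi$, but a suitable sign vector can be found explicitly. The sketch below is one possible way to do so and is not taken from the paper: it relies on the observation that the sign pattern of $\mathrm{Re}(e^{i\phi}x_k)$ changes only at $d$ breakpoints modulo $\pi$, so trying one $\phi$ per interval between breakpoints covers every pattern that the maximization over $\phi$ can produce. The function name and return convention are illustrative.

```python
import numpy as np

def choose_signs(x):
    """Find theta in {+1,-1}^d with |sum_k theta_k x_k| >= (2/pi) * sum_k |x_k|.

    Assumes x is a non-empty sequence of complex numbers.
    """
    x = np.asarray(x, dtype=complex)
    # Re(e^{i phi} x_k) changes sign only at these angles (taken modulo pi).
    breaks = np.sort((np.pi / 2 - np.angle(x)) % np.pi)
    breaks = np.append(breaks, breaks[0] + np.pi)  # close the cycle of length pi
    best_theta, best_val = None, -1.0
    for phi in (breaks[:-1] + breaks[1:]) / 2:     # one candidate phi per interval
        theta = np.where((np.exp(1j * phi) * x).real >= 0, 1, -1)
        val = abs(np.sum(theta * x))
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

rng = np.random.default_rng(0)
x = rng.standard_normal(10) + 1j * rng.standard_normal(10)
theta, val = choose_signs(x)
print(val, 2 / np.pi * np.sum(np.abs(x)))  # first value is at least the second
```

For a fixed sign pattern, $|\sum_k \theta_k x_k|$ is at least $\sum_k |\mathrm{Re}(e^{i\phi}x_k)|$ for every $\phi$ in the corresponding interval, so the best pattern found this way is at least the maximum in (18) and hence at least the average in (19).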



